Imputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approach
Abstract
Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as the simple average, k-nearest neighbor, multiple, and regression imputations may lead to results that are unstable or unable to be computed. Motivated by the concept of semi-supervised learning, we propose a novel approach with which to fill in missing values in covariates that have high missing rates. Specifically, we consider the missing and nonmissing subjects in any covariate as the unlabeled and labeled target outputs, respectively, and treat their corresponding responses as the unlabeled and labeled inputs. This innovative setting allows us to impute a large number of missing data without imposing any model assumptions. In addition, the resulting imputation has a closed form for continuous covariates, and it can be calculated efficiently. An analogous procedure is applicable for discrete covariates. We further employ nonparametric techniques to show the theoretical properties of the imputed covariates. Simulation studies and an online consumer finance example are presented to illustrate the usefulness of the proposed method.
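The abstract does not give the closed-form imputation itself, so the following is only a rough sketch of the general setting it describes: the response acts as the input, the covariate acts as the target, and subjects with an observed covariate value serve as the labeled sample. It assumes a Nadaraya-Watson smoother with a Gaussian kernel, which is a stand-in chosen for illustration and not necessarily the estimator used in the paper. The function name `kernel_impute` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def kernel_impute(y, x, bandwidth=0.5):
    """Fill NaNs in a continuous covariate x using the response y as input.

    Assumption for illustration only: each missing x is replaced by a
    Gaussian-kernel weighted average of the observed x values, where the
    weights depend on how close the responses are. This is not claimed to
    be the paper's estimator.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)  # "labeled" subjects: covariate is available
    if not observed.any():
        raise ValueError("covariate has no observed values to learn from")
    if observed.all():
        return x.copy()

    y_missing = y[~observed][:, None]   # shape (n_missing, 1)
    y_labeled = y[observed][None, :]    # shape (1, n_labeled)

    # Log-kernel weights, shifted per row so the largest weight is exactly 1;
    # this avoids dividing by zero when all responses are far apart.
    log_w = -0.5 * ((y_missing - y_labeled) / bandwidth) ** 2
    log_w -= log_w.max(axis=1, keepdims=True)
    weights = np.exp(log_w)

    filled = x.copy()
    filled[~observed] = weights @ x[observed] / weights.sum(axis=1)
    return filled


if __name__ == "__main__":
    # Toy data: the second subject's covariate is missing.
    responses = [0.0, 1.0, 2.0]
    covariate = [1.0, np.nan, 3.0]
    print(kernel_impute(responses, covariate))
```

Because each imputed value is a weighted average, it can be computed in one pass without fitting a parametric model, which is consistent with the abstract's emphasis on efficiency and the absence of model assumptions. The bandwidth would need to be chosen with care in practice.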
Related resources
Missing Data Imputation for Supervised Learning
This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on ...
Semi-supervised Data Representation via Affinity Graph Learning
We consider the general problem of utilizing both labeled and unlabeled data to improve data representation performance. A new semi-supervised learning framework is proposed by combining manifold regularization and data representation methods such as non-negative matrix factorization and sparse coding. We adopt unsupervised data representation methods as the learning machines because they do not ...
Analysis of presence-only data via semi-supervised learning approaches
Presence-only data, which occur in classification, consist of a sample of observations from the presence class and a large number of background observations with unknown presence/absence. Since absence data are generally unavailable, conventional semi-supervised learning approaches are no longer appropriate, as they tend to degenerate and assign all observations to the presence class. In this article, we ...
Semi-supervised Learning Methods for Data Augmentation
The original goal of this project was to investigate the extent to which data augmentation schemes based on semi-supervised learning algorithms can improve classification accuracy in supervised learning problems. The objectives included determining the appropriate algorithms, customising them for the purposes of this project and providing their Matlab implementations. These algorithms were to b...
Data Selection for Semi-Supervised Learning
The real challenge in pattern recognition tasks and the machine learning process is to train a discriminator using labeled data and use it to distinguish between future data as accurately as possible. However, most real-world problems involve large amounts of data, and labeling them is cumbersome or even impossible. Semi-supervised learning is one approach to overcome these types of p...
Journal
Journal title: Journal of Business & Economic Statistics
Year: 2021
ISSN: 1537-2707, 0735-0015
DOI: https://doi.org/10.1080/07350015.2021.1922120